Pre-Selection in Cluster Lasso Methods for Correlated Variable Selection in High-Dimensional Linear Models
Authors
Abstract
We consider variable selection in high-dimensional sparse regression models with strongly correlated variables. A widely accepted way to handle correlated variables is to cluster or group them first and then fit the model. When the dimension is very high, however, finding an appropriate group structure is as difficult as the original problem. We propose using the Elastic-net as a pre-selection step for Cluster Lasso methods (i.e., Cluster Group Lasso and Cluster Representative Lasso). The Elastic-net selects correlated relevant variables, but it fails to reveal the correlation structure among the active variables. We use Cluster Lasso methods to address this shortcoming of the Elastic-net, and the Elastic-net in turn provides a reduced feature set for the Cluster Lasso methods. We theoretically explore the group selection consistency of the proposed combination of algorithms under several conditions, namely the Irrepresentable Condition (IC), the Elastic-net Irrepresentable Condition (EIC), and the Group Irrepresentable Condition (GIC). We support the theory with examples on simulated and real datasets.
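The two-stage idea in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation. The coordinate-descent solver, the penalty values `l1` and `l2`, the correlation threshold of 0.8, the single-linkage grouping rule, the use of cluster averages as representatives, and the synthetic data are all assumptions chosen for illustration. Only the Cluster Representative Lasso variant is sketched.

```python
import random

def soft_threshold(z, t):
    """Soft-thresholding operator used by Lasso-type coordinate descent."""
    return (z - t) if z > t else (z + t) if z < -t else 0.0

def elastic_net(X, y, l1, l2, n_iter=200):
    """Coordinate descent for (1/2n)||y - Xb||^2 + l1*||b||_1 + (l2/2)*||b||_2^2."""
    n, p = len(X), len(X[0])
    beta, resid = [0.0] * p, list(y)
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of column j with the residual that excludes column j.
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, l1) / (col_sq[j] + l2)
            delta = new - beta[j]
            if delta:
                for i in range(n):
                    resid[i] -= delta * X[i][j]
                beta[j] = new
    return beta

def corr(u, v):
    """Pearson correlation of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / (suu * svv) ** 0.5

def cluster_by_correlation(X, idx, threshold):
    """Single-linkage grouping: join variables whose |correlation| exceeds threshold."""
    cols = {j: [row[j] for row in X] for j in idx}
    clusters = []
    for j in idx:
        linked = [c for c in clusters
                  if any(abs(corr(cols[j], cols[k])) > threshold for k in c)]
        merged = [j]
        for c in linked:
            merged.extend(c)
            clusters.remove(c)
        clusters.append(sorted(merged))
    return clusters

# Synthetic data (assumed): variables 0-2 and 3-5 form two correlated blocks,
# variables 6-11 are independent noise, and only the first block drives y.
rng = random.Random(0)
X, y = [], []
for _ in range(60):
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    row = [z1 + 0.1 * rng.gauss(0, 1) for _ in range(3)]
    row += [z2 + 0.1 * rng.gauss(0, 1) for _ in range(3)]
    row += [rng.gauss(0, 1) for _ in range(6)]
    X.append(row)
    y.append(sum(row[:3]) + 0.1 * rng.gauss(0, 1))

# Step 1: Elastic-net pre-selection keeps variables with nonzero coefficients.
beta = elastic_net(X, y, l1=0.1, l2=0.1)
selected = [j for j, b in enumerate(beta) if b != 0.0]

# Step 2: cluster only the pre-selected variables.
clusters = cluster_by_correlation(X, selected, threshold=0.8)

# Step 3: Lasso (l2 = 0) on cluster representatives, here the cluster averages.
reps = [[sum(row[j] for j in c) / len(c) for c in clusters] for row in X]
rep_beta = elastic_net(reps, y, l1=0.1, l2=0.0)
active_clusters = [c for c, b in zip(clusters, rep_beta) if b != 0.0]

print("pre-selected variables:", selected)
print("clusters:", clusters)
print("active clusters:", active_clusters)
```

Because clustering runs only on the variables that survive pre-selection, the number of pairwise correlations to compute depends on the size of the reduced set rather than on the full dimension, which is the practical motivation the abstract gives for the pre-selection step.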
Similar resources
Efficient Clustering of Correlated Variables and Variable Selection in High-Dimensional Linear Models
In this paper, we introduce the Adaptive Cluster Lasso (ACL) method for variable selection in high-dimensional sparse regression models with strongly correlated variables. To handle correlated variables, the concept of clustering or grouping variables and then pursuing model fitting is widely accepted. When the dimension is very high, finding an appropriate group structure is as difficult as the ori...
A New Combined Approach for Inference in High-Dimensional Regression Models with Correlated Variables
We consider the problem of model selection and estimation in sparse high-dimensional linear regression models with strongly correlated variables. First, we study the theoretical properties of the dual Lasso solution, and we show that joint consideration of the Lasso primal and dual solutions is useful for selecting correlated active variables. Second, we argue that correlation among active...
Variable selection in linear models
Variable selection in linear models is essential for improved inference and interpretation, and it has become even more critical for high-dimensional data. In this article, we provide a selective review of some classical methods, including the Akaike information criterion, the Bayesian information criterion, Mallows's Cp and the risk inflation criterion, as well as regularization methods including...
FIRST: Combining forward iterative selection and shrinkage in high dimensional sparse linear regression
We propose a new class of variable selection techniques for regression in high-dimensional linear models based on a forward selection version of the LASSO, adaptive LASSO or elastic net, called, respectively, the forward iterative regression and shrinkage technique (FIRST), adaptive FIRST and elastic FIRST. These methods seem to work effectively for extremely sparse high dimensional linear m...
Robust high-dimensional semiparametric regression using optimized differencing method applied to the vitamin B2 production data
Background and purpose: With advances in science, knowledge, and technology, we increasingly deal with high-dimensional data in which the number of predictors may considerably exceed the sample size. The main problems with high-dimensional data are the estimation of the coefficients and interpretation. For high-dimension problems, classical methods are not reliable because of a large number of predictor variable...